Introduction

Background Information

Honeybees are very important organisms to the world at large. Honeybees are important pollinators of flowering plants, and thus are essential to the agriculture industry. Despite their vital importance, honeybee populations have been declining for nearly thirty years. One of the main reasons for this decline in honeybee numbers is the Varroa mite (Varroa destructor), an external parasite of honeybees. Varroa mites are responsible for causing Varroosis in honeybees, weakening adult bees by feeding on their haemolymph and thereby increasing their susceptibility to other diseases. The current protocol in place to help rid honeybee colonies of Varroa mites is through miticides. However, this method of pest control is not sustainable as the use of miticides may further weaken the honeybees or contaminate hive products. More importantly, an increase in resistance to miticides by Varroa mites has been seen. Consequently, it is imperative to develop a new means of controlling these pests to increase honeybee populations in the long term.
Varroa mites on adult male honey bee (Apis melifera) (genetic literacy project.org)

Research Objectives

This research aims to examine various odorants and their effects on both Varroa mites and honeybees to analyze their attractant or repellent properties on the two species. Additionally, the effects of the odorants on honeybee and Varroa mite behaviour, as well as effects on in-hive production, will be examined.

The outcome of this research project will be to implement the use of odorants as a more sustainable means of controlling Varroa mite populations, without the use of toxic pesticides. Successful implementation of the results of this study could help slow the decline of honeybee populations worldwide by decreasing the presence of Varroa mites in honeybee colonies.

Progress

Over the summer, Mike conducted electrotarsograms (ETGs) on Varroa mites to examine their physiological sensitivity to several previously identified attractive odourants.

ETGs on Varroa mites were conducted using single odourants previously identified as evoking a response in Varroa mites, and were tested in four different concentrations (10ng, 1ng, 100μg, 10μg). These are typical concentrations used in ETG studies and often encompass concentrations close to what Varroa mites would encounter in their natural environment. Relative to a control stimulus, these reactions can be used to infer the relative importance of each odourant to Varroa mites in the hive environment.

Questions

The questions we want to answer from the dataset are as follows:
1. Are there mite trials that are outliers?
2. Are there odours which consistantly evoke a response?
3. Are there concentration-dependent trends?
4. Can we quantify and account for between-trial (between-mite) variability?

Loading Data

my.df <- read_csv('DataForR.csv')
## Parsed with column specification:
## cols(
##   trial = col_integer(),
##   odour = col_character(),
##   conc = col_double(),
##   percent = col_double(),
##   gb = col_character(),
##   normalized = col_double()
## )

Look at Data

str(my.df)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1165 obs. of  6 variables:
##  $ trial     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ odour     : chr  "2-Hydroxyhexanoic Acid (DCM)" "2-Hydroxyhexanoic Acid (DCM)" "2-Hydroxyhexanoic Acid (DCM)" "Methyl Palmitate" ...
##  $ conc      : num  0.01 0.1 1 0.01 0 10 0.01 1 0.01 0.1 ...
##  $ percent   : num  53.6 142.5 157.9 137.7 100 ...
##  $ gb        : chr  "Good" "Good" "Good" "Good" ...
##  $ normalized: num  13.3 129.8 136.7 127.4 100 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 6
##   .. ..$ trial     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ odour     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ conc      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ percent   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ gb        : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ normalized: list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"
summary(my.df)
##      trial          odour                conc           percent       
##  Min.   : 1.00   Length:1165        Min.   : 0.000   Min.   :  17.59  
##  1st Qu.: 9.00   Class :character   1st Qu.: 0.010   1st Qu.:  78.07  
##  Median :13.00   Mode  :character   Median : 0.100   Median : 100.00  
##  Mean   :12.22                      Mean   : 2.665   Mean   : 117.17  
##  3rd Qu.:16.00                      3rd Qu.: 1.000   3rd Qu.: 125.02  
##  Max.   :18.00                      Max.   :10.000   Max.   :2507.58  
##       gb              normalized     
##  Length:1165        Min.   :-368.60  
##  Class :character   1st Qu.:  71.91  
##  Mode  :character   Median : 100.00  
##                     Mean   :  89.80  
##                     3rd Qu.: 120.01  
##                     Max.   : 196.01
hist(my.df$normalized, 
     xlab="Normalized ETG Response", ylab="Frequency", 
     main="Frequency of Normalized ETG Responses")

p <- ggplot(data=my.df, aes(normalized))
p + geom_density()

Doing this produces a histogram of the percent reaction by Varroa mites. This is an easy way to see how the data is distributed.

Changing Trial number and Concentration to Factors

#change integer to factor for trials
my.df$trial<-factor(my.df$trial)

#lists the levels of factors
levels(my.df$trial)
##  [1] "1"  "2"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [15] "16" "17" "18"
#change number to factor for concentration "conc"
my.df$conc<-factor(my.df$conc)
levels(my.df$conc)
## [1] "0"    "0.01" "0.1"  "1"    "10"
# types of odours
my.df$odour<-factor(my.df$odour)
levels(my.df$odour)
##  [1] "1-Hexadecanol"                "2-Heptanol"                  
##  [3] "2-Heptanone"                  "2-Hydroxyhexanoic Acid (DCM)"
##  [5] "2-Nonanal"                    "Benzoic Acid"                
##  [7] "Butyric Acid"                 "Dodecyl Aldehyde"            
##  [9] "Ethyl Palmitate"              "Geraniol"                    
## [11] "Heptacosane"                  "Heptadecane"                 
## [13] "Hexadecanal"                  "Hexane"                      
## [15] "Methyl Oleate"                "Methyl Palmitate"            
## [17] "Octadecanol"                  "Octanoic Acid"               
## [19] "Palmitic Acid"                "trans-Nerolidol"

Creating a Faceted Plot for All Odours at All Concentrations

ggplot(my.df)+
  geom_boxplot(aes(x=conc,y= (normalized - 100), group= conc))+
  xlab("Concentration")+ylab("Response to Odour")+ggtitle("Faceted ETG Plots")+
  theme_bw(10)+
  facet_wrap(~odour)

Step1: Filter Out Bad Runs

We discovered that all runs with these odours were highly variable and produced messy plots, so we will remove them for now.

good_runs<- filter(my.df,
                    !(odour %in% c("1-Hexadecanol", "Methyl Palmitate", "Hexane",   "Palmitic Acid" )))

Step2: Normalize Scores at Zero Instead of 100

subsetted <- mutate(good_runs,
                    Zresponse = normalized - 100)

Look at Normalized Data Distribution

hist(subsetted$Zresponse,xlab="Normalized ETG Response",ylab="Frequency", main="Frequency of Normalized ETG Responses")

Step3: Filter Out Negative Responses

A negative response makes no sense physiologically, so this is an indication of a bad run. We remove these negative values from analysis by making them NA.

nonegatives <- subsetted %>%
  mutate(Zresponse= ifelse(Zresponse <0, NA, Zresponse)) 
str(nonegatives)
## Classes 'tbl_df', 'tbl' and 'data.frame':    706 obs. of  7 variables:
##  $ trial     : Factor w/ 17 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ odour     : Factor w/ 20 levels "1-Hexadecanol",..: 4 4 4 15 12 12 12 2 5 5 ...
##  $ conc      : Factor w/ 5 levels "0","0.01","0.1",..: 2 3 4 5 2 3 4 3 2 3 ...
##  $ percent   : num  53.6 142.5 157.9 202.7 44.1 ...
##  $ gb        : chr  "Good" "Good" "Good" "Good" ...
##  $ normalized: num  13.3 129.8 136.7 150.7 -27 ...
##  $ Zresponse : num  NA 29.8 36.7 50.7 NA ...

Step4: Make the Plot

Now that we have manipulated the data in a way that is useful to us, we will make our faceted plot.

ggplot(nonegatives)+
  geom_boxplot(aes(x=conc,y= Zresponse, group= conc))+
  xlab("Concentration")+ylab("Normalized Percent Reaction")+ggtitle("Faceted ETG Plots")+
  theme_bw(10)+
  facet_wrap(~odour)
## Warning: Removed 366 rows containing non-finite values (stat_boxplot).

From this plot, we can see the overall trend in ETG responses from Varroa mites at each concentration for each compound tested.

Taking temperature and humidity into account

We are still waiting on better temperature and humidity data. We have been in touch with someone regarding this.

absoluteweather<- read.csv('absoluteweather.csv')

library(tidyverse)
absoluteweather$conc <- factor(absoluteweather$conc)
absoluteweather$trial <- factor(absoluteweather$trial)
str(absoluteweather)
## 'data.frame':    1165 obs. of  13 variables:
##  $ trial     : Factor w/ 17 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ odour     : Factor w/ 20 levels "1-Hexadecanol",..: 4 4 4 16 14 15 1 16 12 12 ...
##  $ conc      : Factor w/ 5 levels "0","0.01","0.1",..: 2 3 4 2 1 5 2 4 2 3 ...
##  $ absamp    : num  3.02 8.09 9 7.95 5.91 ...
##  $ linearhex : num  5.64 5.67 5.7 5.77 5.91 ...
##  $ percent   : num  53.6 142.5 157.9 137.7 100 ...
##  $ normalized: num  13.3 129.8 136.7 127.4 100 ...
##  $ gb        : Factor w/ 2 levels "Bad","Good": 2 2 2 2 2 2 2 2 2 2 ...
##  $ humid     : num  69 69 69.1 69.2 69.3 ...
##  $ temp      : num  20.8 20.8 20.8 20.7 20.7 ...
##  $ dewp      : num  14.9 14.9 14.9 14.9 14.9 ...
##  $ hour      : num  2013 2014 2014 2016 2019 ...
##  $ min       : num  13.1 13.8 14.5 16 18.9 ...
ggplot(data = absoluteweather, mapping = aes(humid, absamp)) + 
  geom_jitter(aes(colour = absoluteweather$trial), width = 0.25) +
  ggtitle("Amplitude vs Humidity")

ggplot(data = absoluteweather, mapping = aes(temp, absamp)) + 
  geom_jitter(aes(colour = absoluteweather$trial), width = 0.25) +
  ggtitle("Amplitude vs Temperature")

ggplot(data = absoluteweather, mapping = aes(dewp, absamp)) + 
  geom_jitter(aes(colour = absoluteweather$trial), width = 0.25) +
  ggtitle("Amplitude vs Dewpoint")

ggplot(data = absoluteweather, mapping = aes(absamp))+
  geom_histogram(bins = 50)

# correlations
cor(absoluteweather$absamp, absoluteweather$temp)
## [1] -0.4868699
cov(absoluteweather$absamp, absoluteweather$temp)
## [1] -234.6016
corr_amp_tempr <- cor.test(x=absoluteweather$absamp, 
                           y=absoluteweather$temp, method = 'spearman')
## Warning in cor.test.default(x = absoluteweather$absamp, y = absoluteweather
## $temp, : Cannot compute exact p-value with ties
corr_amp_tempr
## 
##  Spearman's rank correlation rho
## 
## data:  absoluteweather$absamp and absoluteweather$temp
## S = 367540000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.3946761
corr_amp_humid <- cor.test(x=absoluteweather$absamp, 
                           y=absoluteweather$humid, method = 'spearman')
## Warning in cor.test.default(x = absoluteweather$absamp, y = absoluteweather
## $humid, : Cannot compute exact p-value with ties
corr_amp_humid
## 
##  Spearman's rank correlation rho
## 
## data:  absoluteweather$absamp and absoluteweather$humid
## S = 103650000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.606672
corr_amp_dewp<- cor.test(x=absoluteweather$absamp, 
                         y=absoluteweather$dewp, method = 'spearman')
## Warning in cor.test.default(x = absoluteweather$absamp, y = absoluteweather
## $dewp, : Cannot compute exact p-value with ties
corr_amp_dewp
## 
##  Spearman's rank correlation rho
## 
## data:  absoluteweather$absamp and absoluteweather$dewp
## S = 195510000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.2580989
library(fifer)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
spearman.plot(absoluteweather$absamp, absoluteweather$temp)

weather_matrix <- data.matrix(absoluteweather[7:10], rownames.force = NA)
pairs(weather_matrix) # plots all the stuff simultaneiously

Absolute amplitude (corrected to Hexane blank) versus length of trial

absoluteweather <- read.csv('absoluteweather.csv')
str(absoluteweather)
## 'data.frame':    1165 obs. of  13 variables:
##  $ trial     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ odour     : Factor w/ 20 levels "1-Hexadecanol",..: 4 4 4 16 14 15 1 16 12 12 ...
##  $ conc      : num  0.01 0.1 1 0.01 0 10 0.01 1 0.01 0.1 ...
##  $ absamp    : num  3.02 8.09 9 7.95 5.91 ...
##  $ linearhex : num  5.64 5.67 5.7 5.77 5.91 ...
##  $ percent   : num  53.6 142.5 157.9 137.7 100 ...
##  $ normalized: num  13.3 129.8 136.7 127.4 100 ...
##  $ gb        : Factor w/ 2 levels "Bad","Good": 2 2 2 2 2 2 2 2 2 2 ...
##  $ humid     : num  69 69 69.1 69.2 69.3 ...
##  $ temp      : num  20.8 20.8 20.8 20.7 20.7 ...
##  $ dewp      : num  14.9 14.9 14.9 14.9 14.9 ...
##  $ hour      : num  2013 2014 2014 2016 2019 ...
##  $ min       : num  13.1 13.8 14.5 16 18.9 ...
absoluteweather$trial <- as.factor(absoluteweather$trial)
absoluteweather$conc <- as.factor(absoluteweather$conc)

Step1: Normalize scores at zero

subsetted <- mutate(absoluteweather,
                    Zresponse = normalized - 100)

Step2: Filter out negative responses

nonegatives <- subsetted %>%
  mutate(Zresponse= ifelse(Zresponse <0, NA, Zresponse)) 
str(nonegatives$Zresponse)
##  num [1:1165] NA 29.8 36.7 27.4 0 ...

ETG amplitude plots faceted by trial

ggplot(data = nonegatives, aes(min, absamp))+
  geom_jitter(aes(colour = nonegatives$odour), width = 0.25)+
  xlab("Time")+ylab("Absamp")+ggtitle("Faceted ETG Plots")+
  theme_bw(10)+
  facet_wrap(~trial)

Cut out the Noisy Trails

goodtrials <- nonegatives[nonegatives$trial %in% c(1:8,11, 12), ]
str(goodtrials)
## 'data.frame':    453 obs. of  14 variables:
##  $ trial     : Factor w/ 17 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ odour     : Factor w/ 20 levels "1-Hexadecanol",..: 4 4 4 16 14 15 1 16 12 12 ...
##  $ conc      : Factor w/ 5 levels "0","0.01","0.1",..: 2 3 4 2 1 5 2 4 2 3 ...
##  $ absamp    : num  3.02 8.09 9 7.95 5.91 ...
##  $ linearhex : num  5.64 5.67 5.7 5.77 5.91 ...
##  $ percent   : num  53.6 142.5 157.9 137.7 100 ...
##  $ normalized: num  13.3 129.8 136.7 127.4 100 ...
##  $ gb        : Factor w/ 2 levels "Bad","Good": 2 2 2 2 2 2 2 2 2 2 ...
##  $ humid     : num  69 69 69.1 69.2 69.3 ...
##  $ temp      : num  20.8 20.8 20.8 20.7 20.7 ...
##  $ dewp      : num  14.9 14.9 14.9 14.9 14.9 ...
##  $ hour      : num  2013 2014 2014 2016 2019 ...
##  $ min       : num  13.1 13.8 14.5 16 18.9 ...
##  $ Zresponse : num  NA 29.8 36.7 27.4 0 ...
remove(subsetted)

Histogram of the normalized data

ggplot(data = nonegatives, mapping = aes(Zresponse))+
  geom_histogram(bins = 50)
## Warning: Removed 577 rows containing non-finite values (stat_bin).

Analysis of Variance (ANOVA)

plot(aov_out <- aov(Zresponse~odour*conc*trial, data = goodtrials))
## Warning: not plotting observations with leverage one:
##   1, 2, 5, 6, 8, 12, 13, 14, 19, 21, 22, 23, 24, 25, 26, 28, 31, 32, 33, 38, 39, 40, 43, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 65, 67, 68, 69, 70, 71, 85, 86, 87, 88, 89, 90, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 116, 117, 118, 119, 120, 125, 126, 127, 128, 129, 131, 132, 136, 137, 140, 141, 146, 153, 157, 158, 159, 162, 165, 166, 167, 168, 175, 176, 177, 178, 179, 181, 182, 183, 184, 185, 189, 190, 191, 192, 193, 194, 195, 196, 197, 203, 204, 205, 206, 207, 208, 209, 210, 215, 216, 217, 220

## Warning: not plotting observations with leverage one:
##   1, 2, 5, 6, 8, 12, 13, 14, 19, 21, 22, 23, 24, 25, 26, 28, 31, 32, 33, 38, 39, 40, 43, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 65, 67, 68, 69, 70, 71, 85, 86, 87, 88, 89, 90, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 116, 117, 118, 119, 120, 125, 126, 127, 128, 129, 131, 132, 136, 137, 140, 141, 146, 153, 157, 158, 159, 162, 165, 166, 167, 168, 175, 176, 177, 178, 179, 181, 182, 183, 184, 185, 189, 190, 191, 192, 193, 194, 195, 196, 197, 203, 204, 205, 206, 207, 208, 209, 210, 215, 216, 217, 220

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

summary(aov_out)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## odour            18  34745    1930   6.580 3.18e-09 ***
## conc              3    770     257   0.875   0.4581    
## trial             8  36778    4597  15.671 4.95e-13 ***
## odour:conc       36   8820     245   0.835   0.7195    
## odour:trial      46  20590     448   1.526   0.0538 .  
## conc:trial       22   5605     255   0.869   0.6331    
## odour:conc:trial 20   3987     199   0.680   0.8329    
## Residuals        71  20829     293                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 228 observations deleted due to missingness

Messing Around With Models

library(tidyverse)
library(modelr)
options(na.action = na.warn)
ggplot(goodtrials, aes(trial, absamp))+
  geom_point(aes(colour=odour))

That plot looks too spread out, so let’s log transform the data and try again

goodtrials$logging <- log(goodtrials$absamp)
ggplot(goodtrials, aes(trial, logging))+
  geom_jitter(aes(colour=odour))+
  xlab("Trial")+ylab("Log(Absolute Amplitude of Response)")

gucci<- filter(goodtrials,
                    !(odour %in% c("1-Hexadecanol", "Methyl Palmitate", "Hexane", "Palmitic Acid" )))

Poisson GLM

These aren’t count data, so we don’t anticipate the poisson model to fit very well.

mod01 <- glm(I(as.integer(gucci$Zresponse))~conc+odour, data = gucci, poisson)
## Warning: Dropping 160 rows with missing values
par(mfrow=c(2, 2))
summary(mod01)
## 
## Call:
## glm(formula = I(as.integer(gucci$Zresponse)) ~ conc + odour, 
##     family = poisson, data = gucci)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.3223  -3.4985  -0.7304   2.4474   8.2475  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                        2.92806    0.07993  36.633  < 2e-16 ***
## conc0.1                            0.12707    0.05345   2.377  0.01744 *  
## conc1                              0.22926    0.05261   4.358 1.31e-05 ***
## conc10                            -0.04868    0.05964  -0.816  0.41436    
## odour2-Heptanone                   0.25672    0.11962   2.146  0.03186 *  
## odour2-Hydroxyhexanoic Acid (DCM)  0.24729    0.09133   2.708  0.00678 ** 
## odour2-Nonanal                     0.35450    0.08657   4.095 4.22e-05 ***
## odourBenzoic Acid                 -0.69356    0.13930  -4.979 6.40e-07 ***
## odourButyric Acid                 -0.50406    0.16233  -3.105  0.00190 ** 
## odourDodecyl Aldehyde              0.03866    0.14195   0.272  0.78532    
## odourEthyl Palmitate              -0.17133    0.27055  -0.633  0.52656    
## odourGeraniol                     -0.71717    0.16899  -4.244 2.20e-05 ***
## odourHeptacosane                  -0.03562    0.14921  -0.239  0.81134    
## odourHeptadecane                   0.52008    0.08242   6.310 2.79e-10 ***
## odourHexadecanal                   0.60396    0.08908   6.780 1.20e-11 ***
## odourMethyl Oleate                 0.66536    0.08303   8.013 1.12e-15 ***
## odourOctadecanol                  -0.57679    0.32639  -1.767  0.07719 .  
## odourOctanoic Acid                -0.05128    0.10461  -0.490  0.62398    
## odourtrans-Nerolidol              -0.62392    0.16655  -3.746  0.00018 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2142.0  on 108  degrees of freedom
## Residual deviance: 1696.7  on  90  degrees of freedom
##   (160 observations deleted due to missingness)
## AIC: 2228.1
## 
## Number of Fisher Scoring iterations: 5

Gaussian GLM

mod02 <- glm(Zresponse~conc+odour, data = gucci)
## Warning: Dropping 160 rows with missing values
par(mfrow=c(2, 2))
summary(mod02)
## 
## Call:
## glm(formula = Zresponse ~ conc + odour, data = gucci)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -36.565  -14.599   -2.835   11.036   56.098  
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                        18.4157     8.6300   2.134   0.0356 *
## conc0.1                             3.4408     6.2746   0.548   0.5848  
## conc1                               6.3403     6.3140   1.004   0.3180  
## conc10                             -0.9955     6.7414  -0.148   0.8829  
## odour2-Heptanone                    6.3594    13.5516   0.469   0.6400  
## odour2-Hydroxyhexanoic Acid (DCM)   6.1394    10.0809   0.609   0.5440  
## odour2-Nonanal                      8.9422     9.6888   0.923   0.3585  
## odourBenzoic Acid                  -9.5201    11.3728  -0.837   0.4048  
## odourButyric Acid                  -7.1927    13.6944  -0.525   0.6007  
## odourDodecyl Aldehyde               0.5698    15.0625   0.038   0.9699  
## odourEthyl Palmitate               -1.9607    24.3939  -0.080   0.9361  
## odourGeraniol                     -11.1482    13.5336  -0.824   0.4123  
## odourHeptacosane                   -0.6370    15.2993  -0.042   0.9669  
## odourHeptadecane                   13.9834     9.4044   1.487   0.1405  
## odourHexadecanal                   17.0364    10.6542   1.599   0.1133  
## odourMethyl Oleate                 19.4371     9.8322   1.977   0.0511 .
## odourOctadecanol                   -7.2124    24.3939  -0.296   0.7682  
## odourOctanoic Acid                 -1.1539    10.9092  -0.106   0.9160  
## odourtrans-Nerolidol               -9.5458    13.6894  -0.697   0.4874  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 521.8397)
## 
##     Null deviance: 57674  on 108  degrees of freedom
## Residual deviance: 46966  on  90  degrees of freedom
##   (160 observations deleted due to missingness)
## AIC: 1010.5
## 
## Number of Fisher Scoring iterations: 2
plot(mod02)
## Warning: not plotting observations with leverage one:
##   140, 217
## Warning: not plotting observations with leverage one:
##   140, 217

Assessment of the Gaussian GLM

This model appears to fit the data better than the poisson model.

Binomial GLM

These data aren’t binomial, but we wanted to fit a binomial model anyway just to see what it would look like.

bindf <- gucci %>%
  mutate(BinZresponse= ifelse(Zresponse > 0, 1, 0))
mod03 <- glm(bindf$BinZresponse~conc+odour, data = bindf, binomial)
## Warning: Dropping 160 rows with missing values
## Warning: glm.fit: algorithm did not converge
par(mfrow=c(2, 2))
summary(mod03)
## 
## Call:
## glm(formula = bindf$BinZresponse ~ conc + odour, family = binomial, 
##     data = bindf)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## 2.409e-06  2.409e-06  2.409e-06  2.409e-06  2.409e-06  
## 
## Coefficients:
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        2.657e+01  1.345e+05       0        1
## conc0.1                            1.045e-08  9.782e+04       0        1
## conc1                              9.134e-08  9.843e+04       0        1
## conc10                             2.327e-07  1.051e+05       0        1
## odour2-Heptanone                   2.105e-07  2.113e+05       0        1
## odour2-Hydroxyhexanoic Acid (DCM)  6.163e-07  1.572e+05       0        1
## odour2-Nonanal                     2.031e-07  1.510e+05       0        1
## odourBenzoic Acid                 -3.005e-07  1.773e+05       0        1
## odourButyric Acid                  1.590e-07  2.135e+05       0        1
## odourDodecyl Aldehyde             -1.590e-07  2.348e+05       0        1
## odourEthyl Palmitate               4.537e-06  3.803e+05       0        1
## odourGeraniol                     -3.584e-07  2.110e+05       0        1
## odourHeptacosane                   2.903e-07  2.385e+05       0        1
## odourHeptadecane                   2.075e-07  1.466e+05       0        1
## odourHexadecanal                  -1.734e-07  1.661e+05       0        1
## odourMethyl Oleate                 2.105e-07  1.533e+05       0        1
## odourOctadecanol                   4.537e-06  3.803e+05       0        1
## odourOctanoic Acid                 2.498e-07  1.701e+05       0        1
## odourtrans-Nerolidol               2.402e-07  2.134e+05       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 108  degrees of freedom
## Residual deviance: 6.3237e-10  on  90  degrees of freedom
##   (160 observations deleted due to missingness)
## AIC: 38
## 
## Number of Fisher Scoring iterations: 25
plot(mod03)
## Warning: not plotting observations with leverage one:
##   140, 217
## Warning: not plotting observations with leverage one:
##   140, 217

Assessment of the Binomial GLM

This model appears to fit the data terribly

GLMM

Loading the required package

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
library(modelr)

Running the model

Next, run the model and take random trial effects into account. Since a gaussian glm fit our data the best out of all the glms we ran, we will assume a gaussian family for glmm as well.

Run several models including different combinations of variables to see which fits the data best.

glmm1 <- glmer(Zresponse ~ -1 + odour + conc + (1|trial) + (1|odour:conc), data = gucci, family = gaussian)
## Warning in glmer(Zresponse ~ -1 + odour + conc + (1 | trial) + (1 |
## odour:conc), : calling glmer() with family=gaussian (identity link) as a
## shortcut to lmer() is deprecated; please call lmer() directly
## Warning: Dropping 160 rows with missing values

## Warning: Dropping 160 rows with missing values

## Warning: Dropping 160 rows with missing values
glmm2 <- glmer(Zresponse ~ -1 + odour + (1|trial), data = gucci, family = gaussian)
## Warning in glmer(Zresponse ~ -1 + odour + (1 | trial), data = gucci, family
## = gaussian): calling glmer() with family=gaussian (identity link) as a
## shortcut to lmer() is deprecated; please call lmer() directly

## Warning in glmer(Zresponse ~ -1 + odour + (1 | trial), data = gucci, family
## = gaussian): Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + (1 | trial), data = gucci, family
## = gaussian): Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + (1 | trial), data = gucci, family
## = gaussian): Dropping 160 rows with missing values
glmm3 <- glmer(Zresponse ~ -1 + odour + conc + (1|trial), data = gucci, family = gaussian)
## Warning in glmer(Zresponse ~ -1 + odour + conc + (1 | trial), data =
## gucci, : calling glmer() with family=gaussian (identity link) as a shortcut
## to lmer() is deprecated; please call lmer() directly

## Warning in glmer(Zresponse ~ -1 + odour + conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values
glmm4 <- glmer(Zresponse ~ -1 + odour + conc + gb + (1|trial), data = gucci, family = gaussian)
## Warning in glmer(Zresponse ~ -1 + odour + conc + gb + (1 | trial), data =
## gucci, : calling glmer() with family=gaussian (identity link) as a shortcut
## to lmer() is deprecated; please call lmer() directly

## Warning in glmer(Zresponse ~ -1 + odour + conc + gb + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + conc + gb + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour + conc + gb + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values
glmm5 <- glmer(Zresponse ~ -1 + odour * conc + (1|trial), data = gucci, family = gaussian)
## Warning in glmer(Zresponse ~ -1 + odour * conc + (1 | trial), data =
## gucci, : calling glmer() with family=gaussian (identity link) as a shortcut
## to lmer() is deprecated; please call lmer() directly

## Warning in glmer(Zresponse ~ -1 + odour * conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour * conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values

## Warning in glmer(Zresponse ~ -1 + odour * conc + (1 | trial), data =
## gucci, : Dropping 160 rows with missing values
## fixed-effect model matrix is rank deficient so dropping 13 columns / coefficients
plot(glmm1)

plot(glmm2)

plot(glmm3)

plot(glmm4)

plot(glmm5)

grid2 <- gucci %>% 
   data_grid(data= gucci, odour,conc) %>% 
   add_predictions(glmm2)

ggplot(gucci, aes(x = conc)) + 
  geom_jitter(aes(y = Zresponse)) +
  geom_boxplot(data = grid2, aes(y = pred), colour = "purple", size = 0.2)+
  facet_wrap (~odour)
## Warning: Removed 160 rows containing missing values (geom_point).

grid3 <- gucci %>% 
   data_grid(data= gucci, odour,conc) %>% 
   add_predictions(glmm3)

ggplot(gucci, aes(x = conc)) + 
  geom_jitter(aes(y = Zresponse)) +
  geom_boxplot(data = grid3, aes(y = pred), colour = "red", size = 0.2)+
  facet_wrap (~odour)
## Warning: Removed 160 rows containing missing values (geom_point).

grid4 <- gucci %>% 
   data_grid(data= gucci, odour,conc) %>% 
   add_predictions(glmm4)

ggplot(gucci, aes(x = conc)) + 
  geom_jitter(aes(y = Zresponse)) +
  geom_boxplot(data = grid4, aes(y = pred), colour = "orange", size = 0.2)+
  facet_wrap (~odour)
## Warning: Removed 160 rows containing missing values (geom_point).

grid5 <- gucci %>% 
   data_grid(data= gucci, odour,conc) %>% 
   add_predictions(glmm5)

ggplot(gucci, aes(x = conc)) + 
  geom_jitter(aes(y = Zresponse)) +
  geom_boxplot(data = grid5, aes(y = pred), colour = "blue", size = 0.2)+
  facet_wrap (~odour)
## Warning: Removed 160 rows containing missing values (geom_point).

Test, using an ANOVA, which model best fits the data

anova(glmm1, glmm2, glmm3, glmm4, glmm5)
## refitting model(s) with ML (instead of REML)
## Data: gucci
## Models:
## glmm2: Zresponse ~ -1 + odour + (1 | trial)
## glmm3: Zresponse ~ -1 + odour + conc + (1 | trial)
## glmm1: Zresponse ~ -1 + odour + conc + (1 | trial) + (1 | odour:conc)
## glmm4: Zresponse ~ -1 + odour + conc + gb + (1 | trial)
## glmm5: Zresponse ~ -1 + odour * conc + (1 | trial)
##       Df    AIC    BIC  logLik deviance   Chisq Chi Df Pr(>Chisq)    
## glmm2 18 970.31 1018.8 -467.15   934.31                              
## glmm3 21 972.64 1029.2 -465.32   930.64  3.6719      3    0.29914    
## glmm1 22 974.64 1033.8 -465.32   930.64  0.0000      1    1.00000    
## glmm4 22 974.52 1033.7 -465.26   930.52  0.1148      0    < 2e-16 ***
## glmm5 53 993.35 1136.0 -443.67   887.35 43.1760     31    0.07176 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Back to top